Modelling Salmonella Typhi in a high-density urban neighbourhood of Blantyre, Malawi, using point pattern methods

Salmonella Typhi is a human-restricted pathogen that is transmitted by the faecal–oral route and is the causative organism of typhoid fever. Using health facility data from 2016 to 2020, this study models the spatial variation in typhoid risk in Ndirande township in Blantyre. To pursue this objective, we developed a marked inhomogeneous Poisson process model that allows us to incorporate both individual-level and environmental risk factors. Our results indicate that typhoid cases are spatially clustered, with incidence decreasing by 54% for each unit increase in the water, sanitation and hygiene (WASH) score. Typhoid intensity was also higher in children aged below 18 years than in adults. However, our results did not show evidence of strong temporal variation in typhoid incidence. We also discuss the inferential benefits of using point pattern models to characterise the spatial variation in typhoid risk and outline possible extensions of the proposed modelling framework.


Distance to the health facility
We calculated the Euclidean distance from each location in Ndirande township to Ndirande health facility, the largest government-owned health facility in the township. The health facility is shown as a white star in Figure 2.
Figure 2: Distance in meters from every location on the grid to Ndirande health facility.
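As a minimal sketch of the distance covariate, the code below computes the Euclidean distance from every grid-cell centre to one facility with NumPy. It assumes a projected coordinate system measured in meters (for example UTM). The function name `distance_grid`, the grid extent, the 100 m cell size and the facility coordinates in the usage lines are hypothetical values, not the study's.

```python
import numpy as np


def distance_grid(xmin, xmax, ymin, ymax, cell_size, facility_xy):
    """Euclidean distance from every grid-cell centre to a single facility.

    Coordinates are assumed to be in a projected system measured in meters,
    so the result is also in meters.
    """
    xs = np.arange(xmin + cell_size / 2, xmax, cell_size)  # cell-centre x values
    ys = np.arange(ymin + cell_size / 2, ymax, cell_size)  # cell-centre y values
    gx, gy = np.meshgrid(xs, ys)                           # shape (len(ys), len(xs))
    fx, fy = facility_xy
    return np.hypot(gx - fx, gy - fy)


# Hypothetical 1 km x 1 km window, 100 m cells, facility at the window centre.
d = distance_grid(0.0, 1000.0, 0.0, 1000.0, 100.0, (500.0, 500.0))
print(d.shape)            # (10, 10)
print(round(d.min(), 2))  # 70.71
```

The resulting array has the same layout as a raster. It can be written out with any GIS library and used directly as a spatial covariate.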

Water, sanitation and hygiene (WASH) score
A water, sanitation and hygiene (WASH) survey was carried out in Ndirande township in 2018 as part of the STRATAA study. A total of 14,136 households were sampled. Households were asked several questions about their WASH conditions and economic status, including:
• The number of rooms in the house (continuous variable).
• The type of toilet used by the household (no toilet facility, toilet shared with other households (public), toilet shared with neighbours, and household use only).
• The material of the toilet used by the household (open defecation, pit latrine with a wooden or soil floor, pit latrine with slab, flush or pour toilet).
• The main source of drinking water for the household (borehole and other unprotected sources, public standpipe, piped to the house, protected well or borehole, private tap located outside the house, public standpipe and public tap outside the house).
A WASH score was derived from these questions using principal component analysis (PCA). Figure 3 shows the percentage of variance explained by each component. In epidemiological studies that measure household socioeconomic status, it is common practice to use the first component as the socioeconomic index or score [2,3]. Our WASH score was therefore based on the first principal component. We then fitted the following linear geostatistical model [4] to the PCA scores using the PrevMap package [5]:

$$Y(x_i) = \mu + S(x_i) + Z_i,$$

where $Y(x_i)$ is the observed WASH score at location $x_i$, $\mu$ is the constant mean (intercept), $Z_i \sim N(0, \tau^2)$ are independent Gaussian variables, and $S(x_i) \sim N(0, \sigma^2)$ is a zero-mean stationary and isotropic Gaussian process.
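The WASH score construction can be sketched as follows. The code is an illustrative NumPy version of PCA on standardised variables, computed through the singular value decomposition. It assumes the categorical answers have already been one-hot encoded and stacked with the number of rooms into a numeric matrix. The function name `first_pc_score` and its `orient_col` argument are hypothetical; `orient_col` fixes the arbitrary sign of the component. The original analysis may have used different software and scaling choices.

```python
import numpy as np


def first_pc_score(X, orient_col=0):
    """First principal-component scores of a household-by-variable matrix.

    X holds one row per household, e.g. number of rooms plus one-hot indicators
    for toilet type, toilet material and drinking-water source. Columns are
    standardised so that no variable dominates because of its scale.
    Returns (scores, proportion of variance explained by PC1, PC1 loadings).
    """
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    Z = (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)  # constant columns stay at 0
    # Rows of Vt are the principal axes; s**2 is proportional to their variances.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # The sign of a principal component is arbitrary: flip it so that the loading
    # of `orient_col` (hypothetically, an indicator of better WASH) is positive.
    if loadings[orient_col] < 0:
        loadings = -loadings
    scores = Z @ loadings
    explained = s**2 / np.sum(s**2)
    return scores, explained[0], loadings


# Toy data: two perfectly correlated columns, so PC1 explains all the variance.
scores, share, loadings = first_pc_score([[1, 2], [2, 4], [3, 6]])
print(round(share, 6))  # 1.0
```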
After assessing the model's goodness of fit using a semi-variogram, we carried out linear prediction of the WASH score over the whole study area. The predicted surface was converted to a raster and used as a covariate in the point process model.
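Under the model $Y(x_i) = \mu + S(x_i) + Z_i$, linear prediction amounts to simple kriging. The sketch below is a plug-in version that treats $\mu$, $\sigma^2$, $\tau^2$ and a range parameter $\phi$ as known. It also assumes an exponential correlation function $\rho(d) = \exp(-d/\phi)$ for $S$. The covariance family and parameter values actually used with PrevMap are not stated here, so both are assumptions, and the function name `predict_wash` is hypothetical.

```python
import numpy as np


def predict_wash(coords, y, new_coords, mu, sigma2, phi, tau2):
    """Plug-in linear (simple kriging) prediction of mu + S(x*) given Y.

    Model: Y(x_i) = mu + S(x_i) + Z_i, with Var S = sigma2, Var Z_i = tau2 and an
    assumed exponential correlation rho(d) = exp(-d / phi) for S.
    Returns the predictive mean of mu + S(x*) and the variance of S(x*) given Y.
    """
    coords = np.asarray(coords, dtype=float)
    new_coords = np.asarray(new_coords, dtype=float)
    y = np.asarray(y, dtype=float)
    d_obs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_new = np.linalg.norm(new_coords[:, None, :] - coords[None, :, :], axis=-1)
    V = sigma2 * np.exp(-d_obs / phi) + tau2 * np.eye(len(y))  # Var(Y)
    C = sigma2 * np.exp(-d_new / phi)                           # Cov(S(x*), Y)
    mean = mu + C @ np.linalg.solve(V, y - mu)
    # Var[S(x*) | Y] = sigma2 - C V^{-1} C', evaluated row by row.
    var = sigma2 - np.einsum("ij,ji->i", C, np.linalg.solve(V, C.T))
    return mean, var


# Toy example: with tau2 = 0 the predictor interpolates the data exactly.
mean, var = predict_wash([[0, 0], [1, 0], [0, 2]], [1.0, 2.0, 0.5],
                         [[0, 0], [5, 5]], mu=1.0, sigma2=1.0, phi=1.0, tau2=0.0)
```

Far from all observations, the prediction reverts to $\mu$ and the variance approaches $\sigma^2$. Evaluating the function at the centres of a regular grid gives a surface that can be stored as a raster.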

Model validation plots
We used the inhomogeneous K-function to validate our spatial point pattern model. Figures 9 to 14 show that the inhomogeneous K-functions of the observed data fell within the simulated envelopes at most distances, which suggests that the model fits the data adequately.

A spatio-temporal point process can be defined as a realisation of a stochastic process whose events are countable [6]. The set of events can be written as $(x_k, t_k)$, where $x_k \in \mathbb{R}^2$ is the location of event $k$ and $t_k \in \mathbb{R}^+$ is the time at which it occurred [7]. For the marked process, the log-likelihood is

$$\ell(\alpha, \gamma, \beta) = \sum_{k=1}^{n} \log \lambda_{i_k j_k}(x_k, t_k) - \sum_{i} \sum_{j} \int_{T} \int_{A} \lambda_{ij}(x, t)\, dx\, dt, \qquad (3)$$

where

$$\lambda_{ij}(x, t) = \exp\{\alpha_i + \gamma_j + d(x, t)'\beta + \log m_{ij}(x, t)\}$$

is the intensity of the spatio-temporal point process for individuals of gender $i$ and age group $j$; the purely spatial intensity $\lambda_{ij}(x)$ has the same form without the temporal terms. In equation 3:
• $x_k$, $k = 1, \dots, n$, are the locations of the observed typhoid cases, $t_k$ their times, and $i_k$ and $j_k$ the gender (male or female) and age group (0-5 years, 6-17 years or 18+ years) of case $k$
• $A$ is the study region and $T$ the temporal region
• $\lambda(x, t)$ is the intensity of the process
• $\alpha_i$ is the intercept for a typhoid case of gender $i$ and $\gamma_j$ the intercept for a typhoid case in age group $j$
• $d(x, t)$ is the matrix of spatial and temporal covariates (such as distance to Ndirande health clinic in meters, elevation in meters, WASH score and season), with associated coefficients $\beta$
• $m_{ij}(x, t)$ is an offset corresponding to the population of individuals of gender $i$ and age group $j$ at location $x$ and time $t$.
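The integral in the log-likelihood of equation 3 has no closed form when covariates come from rasters. One common approach is to approximate it by a Riemann sum over space-time grid cells. The sketch below does this in NumPy for the intensity $\lambda_{ij}(x, t) = \exp\{\alpha_i + \gamma_j + d(x, t)'\beta + \log m_{ij}(x, t)\}$. The argument layout, the function name `marked_poisson_loglik` and the grid approximation are assumptions for illustration, not the procedure used in the study.

```python
import numpy as np


def marked_poisson_loglik(alpha, gamma, beta, case_cov, case_sex, case_age,
                          case_logm, grid_cov, grid_logm, cell_volume):
    """Grid approximation to the marked inhomogeneous Poisson log-likelihood.

    alpha (I,), gamma (J,), beta (p,) : intercepts and covariate coefficients
    case_cov (n, p)      : covariates d(x_k, t_k) at each case
    case_sex, case_age   : (n,) integer mark indices i_k and j_k
    case_logm (n,)       : log offset log m_{i_k j_k}(x_k, t_k)
    grid_cov (G, p)      : covariates at G space-time cell centres
    grid_logm (I, J, G)  : log offsets for every mark combination and cell
    cell_volume          : area x duration of one cell
    """
    alpha = np.asarray(alpha, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # First term: log-intensity evaluated at each observed case and its marks.
    eta_cases = alpha[case_sex] + gamma[case_age] + case_cov @ beta + case_logm
    # Second term: integral of every mark-specific intensity over A x T,
    # approximated by summing intensity x cell volume over the grid.
    eta_grid = (alpha[:, None, None] + gamma[None, :, None]
                + (grid_cov @ beta)[None, None, :] + grid_logm)
    integral = np.exp(eta_grid).sum() * cell_volume
    return eta_cases.sum() - integral


# Toy check: two cases, one mark level, intensity 2 over a total volume of 1.
ll = marked_poisson_loglik(alpha=[np.log(2.0)], gamma=[0.0], beta=[0.0],
                           case_cov=np.zeros((2, 1)), case_sex=np.array([0, 0]),
                           case_age=np.array([0, 0]), case_logm=np.zeros(2),
                           grid_cov=np.zeros((4, 1)), grid_logm=np.zeros((1, 1, 4)),
                           cell_volume=0.25)
```

To obtain maximum likelihood estimates, the negative of this function can be passed to a general-purpose optimiser such as `scipy.optimize.minimize`. The accuracy of the approximation depends on how fine the grid is.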
Model 3 uses the same bootstrap procedure for confidence intervals that was defined in the main paper for the purely spatial model.
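The K-function check described above can be illustrated with a small NumPy sketch, which works in three steps:
• simulate patterns from a fitted intensity by Lewis-Shedler thinning;
• compute the inhomogeneous K-function with weights $1/\{\lambda(x_i)\lambda(x_j)\}$;
• report the share of distances at which the observed curve lies inside the pointwise min-max envelope.

The sketch assumes a rectangular window and omits edge correction. It also uses 99 simulations rather than the 10,000 realisations reported for the figures. The intensity in the usage lines is made up. In R, the spatstat functions `Kinhom` and `envelope` provide edge-corrected versions.

```python
import numpy as np


def simulate_thinning(intensity, lam_max, width, height, rng):
    """Inhomogeneous Poisson pattern on [0, width] x [0, height] by thinning.

    `intensity` maps an (n, 2) array of points to intensities; lam_max must bound it.
    """
    n = rng.poisson(lam_max * width * height)                # homogeneous proposals
    pts = rng.uniform([0.0, 0.0], [width, height], size=(n, 2))
    keep = rng.uniform(size=n) < intensity(pts) / lam_max    # keep w.p. lambda/lam_max
    return pts[keep]


def k_inhom(pts, lam, r, area):
    """Inhomogeneous K: sum_{i != j} 1{d_ij <= r} / (lam_i lam_j) / area, no edge correction."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                               # drop i == j pairs
    w = 1.0 / np.outer(lam, lam)
    return np.array([(w * (d <= ri)).sum() / area for ri in r])


def envelope_check(obs, intensity, lam_max, width, height, r, nsim=99, seed=1):
    """Observed K, pointwise min/max envelope of nsim simulations, share of r inside."""
    rng = np.random.default_rng(seed)
    area = width * height
    k_obs = k_inhom(obs, intensity(obs), r, area)
    sims = []
    for _ in range(nsim):
        p = simulate_thinning(intensity, lam_max, width, height, rng)
        sims.append(k_inhom(p, intensity(p), r, area))
    sims = np.array(sims)
    lo, hi = sims.min(axis=0), sims.max(axis=0)
    inside = np.mean((k_obs >= lo) & (k_obs <= hi))
    return k_obs, lo, hi, inside


def lam(p):
    """Hypothetical fitted intensity on the unit square, decreasing west to east."""
    return 200.0 * np.exp(-2.0 * p[:, 0])


obs = simulate_thinning(lam, 200.0, 1.0, 1.0, np.random.default_rng(0))
r = np.linspace(0.01, 0.1, 10)
k_obs, lo, hi, inside = envelope_check(obs, lam, 200.0, 1.0, 1.0, r)
print(f"Observed K inside the envelope at {inside:.0%} of distances")
```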

Model validation
Similar to the purely spatial model defined in the main paper, the spatio-temporal model can be validated using a spatio-temporal inhomogeneous K-function, defined as [8]:

$$K(u, v) = 2\pi \int_{-v}^{v} \int_{0}^{u} g(u', |v'|)\, u'\, du'\, dv', \qquad (4)$$

where
• $u = \lVert x - x' \rVert$ is the separation in space and $v = |t - t'|$ the separation in time
• $(u, v)$ is the vector of differences in the spatio-temporal domain
• $g(u, v) = \lambda_2(u, v) / \{\lambda(x, t)\,\lambda(x', t')\}$ is the pair correlation function, with $\lambda_2$ the second-order intensity.

A non-parametric estimator of equation 4 is implemented in the stpp R package. For an infectious disease such as typhoid, only pairs ordered forward in time are considered, and the estimator is [7]:

$$\hat{K}(u, v) = \frac{n}{n_v} \frac{1}{|A|\,|T|} \sum_{k=1}^{n_v} \sum_{h > k} \frac{1}{w_{kh}} \frac{\mathbb{1}\{\lVert x_k - x_h \rVert \le u\}\, \mathbb{1}\{t_h - t_k \le v\}}{\lambda(x_k, t_k)\,\lambda(x_h, t_h)}, \qquad (5)$$

with the cases indexed in time order. The parameter $w_{kh}$ in equation 5 denotes the spatial edge-correction factor, and $n_v$ denotes the number of (typhoid) cases with $t_k \le T_1 - v$, where $T = [T_0, T_1]$ [7].
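To make equation 5 concrete, the sketch below evaluates the infectious-disease estimator directly in NumPy. It assumes that all edge-correction weights $w_{kh}$ equal 1, so estimates are biased near the boundary of $A$. The user supplies the area $|A|$ and the time window $[T_0, T_1]$. The function name `stik_infectious` is hypothetical. For real analyses, the stpp function `STIKhat` provides edge-corrected estimates.

```python
import numpy as np


def stik_infectious(xy, t, lam, u, v, area, T0, T1):
    """Equation-5 style estimate of the space-time inhomogeneous K-function.

    Edge-correction weights w_kh are all set to 1 (an assumption). Cases are
    sorted by time; for each v only the n_v cases with t_k <= T1 - v enter the
    outer sum, and only cases h later in time order enter the inner sum.
    Returns an array of shape (len(u), len(v)); columns with n_v = 0 are NaN.
    """
    order = np.argsort(t, kind="stable")
    xy = np.asarray(xy, dtype=float)[order]
    t = np.asarray(t, dtype=float)[order]
    lam = np.asarray(lam, dtype=float)[order]
    n = len(t)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    dt = t[None, :] - t[:, None]                       # dt[k, h] = t_h - t_k
    later = np.triu(np.ones((n, n), dtype=bool), k=1)  # h after k in time order
    w = 1.0 / np.outer(lam, lam)                       # 1 / (lambda_k lambda_h)
    volume = area * (T1 - T0)                          # |A| |T|
    K = np.full((len(u), len(v)), np.nan)
    for b, vb in enumerate(v):
        in_first = t <= T1 - vb                        # the first n_v cases
        n_v = in_first.sum()
        if n_v == 0:
            continue
        for a, ua in enumerate(u):
            pairs = later & in_first[:, None] & (dist <= ua) & (dt <= vb)
            K[a, b] = (n / n_v) * (w * pairs).sum() / volume
    return K


# Toy example: one close, consecutive pair of cases and one distant case.
K = stik_infectious(xy=[[0, 0], [0.1, 0], [5, 5]], t=[0.0, 1.0, 2.0],
                    lam=[1.0, 1.0, 1.0], u=[0.5], v=[1.5, 9.5],
                    area=1.0, T0=0.0, T1=10.0)
```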

Figure 3: Eigenvalues showing the percentage of variance explained by each component.

Figure 4: Contribution of variables to the WASH score

Figure 7 shows the estimated number of people per grid cell (population count) at 100 m resolution and the estimated population density per grid cell at 1 km resolution in Ndirande in 2018. The age- and gender-specific population distributions are presented in Figure 8.
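If the 1 km density layer is derived from the 100 m counts, the two resolutions are related by simple aggregation: a 10 by 10 block of 100 m cells forms one 1 km cell. The sketch below computes block sums and densities in people per km². It assumes the population surface is a NumPy array of counts whose dimensions are multiples of the aggregation factor. The function name and the toy raster are hypothetical, and the study's 1 km density layer may instead come from a separate data product.

```python
import numpy as np


def aggregate_population(counts, factor=10, cell_km=0.1):
    """Aggregate a fine population-count raster into coarser blocks.

    counts : 2-D array of people per fine cell (cell_km on a side); both
             dimensions must be multiples of `factor`.
    Returns (people per coarse cell, people per km^2) for coarse cells that
    are factor * cell_km km on a side.
    """
    counts = np.asarray(counts, dtype=float)
    nr, nc = counts.shape
    if nr % factor or nc % factor:
        raise ValueError("raster dimensions must be multiples of factor")
    # Split rows and columns into (block, within-block) axes and sum within blocks.
    blocks = counts.reshape(nr // factor, factor, nc // factor, factor).sum(axis=(1, 3))
    coarse_area_km2 = (factor * cell_km) ** 2
    return blocks, blocks / coarse_area_km2


# Toy raster: 20 x 20 cells of 100 m, each holding 3 people.
blocks, density = aggregate_population(np.full((20, 20), 3.0))
```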

Figure 7: Map of population distribution in Ndirande in 2018.

Figure 8: Map of age and gender-specific population distribution in Ndirande in 2018.

Figure 9: Spatial inhomogeneous K-function for males aged 0 to 5 years. The black line represents the inhomogeneous K-function of the observed data, and the grey area represents the envelope of inhomogeneous K-functions from the 10,000 bootstrap realisations.